1.
IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1107-1116, 2019.
Article in English | MEDLINE | ID: mdl-28574365

ABSTRACT

High-throughput sequencing techniques have generated massive quantities of genotype data. Haplotype phasing has proven to be a useful and effective method for analyzing these data. However, the quality of phasing is undermined due to missing information. Imputation provides an effective means of improving the underlying genotype information. For model organisms, imputation can rely on an available reference genotype panel and a physical or genetic map. For non-model organisms, which often do not have a genotype panel, it is important to design an imputation technique that does not rely on reference data. Here, we present Accurate Data-Driven Imputation Technique (ADDIT), which is composed of two data-driven algorithms capable of handling data generated from model and non-model organisms. The non-model variant of ADDIT (referred to as ADDIT-NM) employs statistical inference methods to impute missing genotypes, whereas the model variant (referred to as ADDIT-M) leverages a supervised learning-based approach for imputation. We demonstrate that both variants of ADDIT are more accurate, faster, and require less memory than leading state-of-the-art imputation tools using model (human) and non-model (maize, apple, and grape) genotype data. Software Availability: The source code of ADDIT and test data sets are available at https://github.com/NDBL/ADDIT.
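
To make the idea of reference-free imputation concrete, the sketch below fills each missing genotype with a majority vote among the most similar samples in the same data set. It is not the ADDIT-NM algorithm, whose details the abstract does not give. The function names, the 0/1/2 genotype coding, the use of `None` for a missing call, and the parameter `k` are all assumptions made for illustration.

```python
from collections import Counter


def similarity(a, b):
    """Fraction of sites where both samples are called and agree."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def impute_from_neighbors(genotypes, k=2):
    """Fill missing calls (None) by majority vote among the k most similar samples.

    genotypes: list of samples, each a list of calls coded 0, 1, 2 or None.
    This is a hypothetical illustration, not the published method.
    """
    imputed = [list(row) for row in genotypes]
    for i, row in enumerate(genotypes):
        # Rank all other samples by similarity to the current one.
        others = [j for j in range(len(genotypes)) if j != i]
        ranked = sorted(others, key=lambda j: similarity(row, genotypes[j]), reverse=True)
        for site, call in enumerate(row):
            if call is not None:
                continue
            # Take the first k neighbors that actually have a call at this site.
            votes = [genotypes[j][site] for j in ranked if genotypes[j][site] is not None][:k]
            if votes:
                imputed[i][site] = Counter(votes).most_common(1)[0][0]
    return imputed


# Toy panel: the second sample is missing its third genotype.
panel = [
    [0, 1, 2, 0],
    [0, 1, None, 0],
    [2, 1, 0, 2],
    [0, 1, 2, 0],
]
print(impute_from_neighbors(panel)[1])
```

A site stays missing when no neighbor has a call there, which keeps the sketch from inventing genotypes without any supporting data.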


Subjects
Computational Biology/methods , Genetic Techniques , Genotype , Algorithms , Genomics/methods , Genotyping Techniques , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Malus/genetics , Models, Statistical , Polymorphism, Single Nucleotide , Reproducibility of Results , Software , Vitis/genetics , Zea mays/genetics
2.
AMIA Annu Symp Proc ; 2019: 313-322, 2019.
Article in English | MEDLINE | ID: mdl-32308824

ABSTRACT

Using electronic health data to predict adverse drug reaction (ADR) incurs practical challenges, such as lack of adequate data from any single site for rare ADR detection, resource constraints on integrating data from multiple sources, and privacy concerns with creating a centralized database from person-specific, sensitive data. We introduce a federated learning framework that can learn a global ADR prediction model from distributed health data held locally at different sites. We propose two novel methods of local model aggregation to improve the predictive capability of the global model. Through comprehensive experimental evaluation using real-world health data from 1 million patients, we demonstrate the effectiveness of our proposed approach in achieving comparable performance to centralized learning and outperforming localized learning models for two types of ADRs. We also demonstrate that, for varying data distributions, our aggregation methods outperform state-of-the-art techniques, in terms of precision, recall, and accuracy.
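
The abstract does not describe its two aggregation methods, so the sketch below shows only the standard baseline they would be compared against: sample-size-weighted averaging of local model parameters, often called federated averaging. The function name, the flat list-of-floats weight representation, and the example numbers are assumptions for illustration.

```python
def federated_average(local_weights, sample_counts):
    """Combine per-site weight vectors into one global vector.

    Each site's contribution is proportional to its number of training samples.
    This is the generic weighted-average baseline, not the paper's own methods.
    """
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need exactly one sample count per local model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(local_weights, sample_counts):
        if len(weights) != dim:
            raise ValueError("all weight vectors must have the same length")
        for d, w in enumerate(weights):
            global_weights[d] += w * n / total
    return global_weights


# Two hypothetical sites: only the parameters leave each site, not patient records.
site_models = [[1.0, 0.0], [3.0, 2.0]]
site_sizes = [100, 300]
print(federated_average(site_models, site_sizes))  # [2.5, 1.5]
```

In a full training loop, each site would retrain from the returned global vector and the averaging step would repeat over several rounds.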


Subjects
Adverse Drug Reaction Reporting Systems , Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Machine Learning , Databases, Factual , Humans , Logistic Models , Support Vector Machine
3.
Sci Rep ; 8(1): 9936, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29967328

ABSTRACT

Second-generation DNA sequencing techniques generate short reads that can result in fragmented genome assemblies. Third-generation sequencing platforms mitigate this limitation by producing longer reads that span across complex and repetitive regions. However, the usefulness of such long reads is limited because of high sequencing error rates. To exploit the full potential of these longer reads, it is imperative to correct the underlying errors. We propose HECIL (Hybrid Error Correction with Iterative Learning), a hybrid error correction framework that determines a correction policy for erroneous long reads, based on optimal combinations of decision weights obtained from short read alignments. We demonstrate that HECIL outperforms state-of-the-art error correction algorithms for an overwhelming majority of evaluation metrics on diverse, real-world data sets including E. coli, S. cerevisiae, and the malaria vector mosquito A. funestus. Additionally, we provide an optional avenue of improving the performance of HECIL's core algorithm by introducing an iterative learning paradigm that enhances the correction policy at each iteration by incorporating knowledge gathered from previous iterations via data-driven confidence metrics assigned to prior corrections.
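
As a rough picture of hybrid correction, the sketch below replaces a long-read base whenever enough aligned short reads agree on a different base at that position. It uses a plain unweighted majority vote, which is a deliberate simplification and not HECIL's weighted correction policy or its iterative learning step. The function name, the `(offset, sequence)` alignment format, the ungapped-alignment assumption, and `min_support` are all hypothetical.

```python
from collections import Counter


def correct_long_read(long_read, short_alignments, min_support=2):
    """Correct a long read using an unweighted majority vote of short reads.

    short_alignments: list of (offset, sequence) pairs, where offset is the
    0-based start of an ungapped short-read alignment on the long read.
    """
    corrected = list(long_read)
    # One base counter per long-read position.
    pileup = [Counter() for _ in long_read]
    for offset, seq in short_alignments:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                pileup[pos][base] += 1
    for pos, counts in enumerate(pileup):
        if not counts:
            continue  # no short-read coverage: keep the long-read base
        base, support = counts.most_common(1)[0]
        if support >= min_support and base != corrected[pos]:
            corrected[pos] = base
    return "".join(corrected)


# The long read carries one substitution error (T instead of A at position 4).
noisy = "ACGTTCGA"
shorts = [(2, "GTAC"), (3, "TACG"), (4, "ACGA")]
print(correct_long_read(noisy, shorts))  # ACGTACGA
```

Real long reads are dominated by insertions and deletions, so a practical tool would work from gapped alignments rather than the fixed offsets assumed here.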


Subjects
High-Throughput Nucleotide Sequencing/methods , Machine Learning , Sequence Analysis, DNA/methods , Escherichia coli/genetics , Mosquito Vectors/genetics , Repetitive Sequences, Nucleic Acid , Saccharomyces cerevisiae/genetics
4.
BMC Genomics ; 18(1): 417, 2017 05 30.
Article in English | MEDLINE | ID: mdl-28558688

ABSTRACT

BACKGROUND: Restriction site associated DNA sequencing (RADseq) has the potential to be a broadly applicable, low-cost approach for high-quality genetic linkage mapping in forest trees lacking a reference genome. The statistical inference of linear order must be as accurate as possible for the correct ordering of sequence scaffolds and contigs to chromosomal locations. Accurate maps also facilitate the discovery of chromosome segments containing allelic variants conferring resistance to the biotic and abiotic stresses that threaten forest trees worldwide. We used ddRADseq for genetic mapping in the tree Quercus rubra, with an approach optimized to produce a high-quality map. Our study design also enabled us to model the results we would have obtained with less depth of coverage.

RESULTS: Our sequencing design produced a high sequencing depth in the parents (248×) and a moderate sequencing depth (15×) in the progeny. The digital normalization method of generating a de novo reference and the SAMtools SNP variant caller yielded the most SNP calls (78,725). The major drivers of map inflation were multiple SNPs located within the same sequence (77% of SNPs called). The highest quality map was generated with a low level of missing data (5%) and a genome-wide threshold of 0.025 for deviation from Mendelian expectation. The final map included 849 SNP markers (1.8% of the 78,725 SNPs called). Downsampling the individual FASTQ files to model lower depth of coverage revealed that sequencing the progeny using 96 samples per lane would have yielded too few SNP markers to generate a map, even if we had sequenced the parents at depth 248×.

CONCLUSIONS: The ddRADseq technology produced enough high-quality SNP markers to make a moderately dense, high-quality map. The success of this project was due to high depth of coverage of the parents, moderate depth of coverage of the progeny, a good framework map, an optimized bioinformatics pipeline, and rigorous premapping filters. The ddRADseq approach is useful for the construction of high-quality genetic maps in organisms lacking a reference genome if the parents and progeny are sequenced at sufficient depth. Technical improvements in reduced representation sequencing (RRS) approaches are needed to reduce the amount of missing data.
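
One common way to apply a filter for deviation from Mendelian expectation is a chi-square goodness-of-fit test on the progeny genotype counts at each marker. The sketch below assumes a marker expected to segregate 1:1 (heterozygous in one parent only) and treats 0.025 as a p-value cutoff. Both are assumptions, since the abstract does not state the test or the expected ratio, and the function names are hypothetical.

```python
import math


def segregation_p_value(count_a, count_b):
    """Chi-square goodness-of-fit p-value (1 df) against an expected 1:1 ratio."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no genotype calls for this marker")
    expected = total / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)), so no external library is needed.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_segregation_filter(count_a, count_b, threshold=0.025):
    """Keep a marker only if its p-value is at or above the threshold."""
    return segregation_p_value(count_a, count_b) >= threshold


# Hypothetical progeny counts for two markers.
print(passes_segregation_filter(55, 45))  # mild deviation
print(passes_segregation_filter(70, 30))  # strong distortion
```

Markers heterozygous in both parents would need a 1:2:1 expectation and two degrees of freedom, which this sketch does not cover.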


Subjects
Chromosome Mapping/methods , DNA Restriction Enzymes/metabolism , Quercus/genetics , Sequence Analysis, DNA , Genotyping Techniques , Polymorphism, Single Nucleotide
5.
Mol Biol Rep ; 40(2): 1103-25, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23086300

ABSTRACT

Biochemical networks comprise many diverse components and the interactions between them. They include intracellular signaling, metabolic, and gene regulatory pathways, which are highly integrated and whose responses are elicited by extracellular actions. Previous modeling techniques mostly consider each pathway independently, without focusing on the interrelations among them, even though the pathways actually function as a single system. In this paper, we propose an approach to modeling an integrated pathway using an event-driven modeling tool, namely Petri nets (PNs). PNs can simulate the dynamics of a system with high levels of accuracy. The integrated set of signaling, regulatory, and metabolic reactions involved in the HOG pathway of Saccharomyces cerevisiae was collected from the literature. Kinetic parameter values were used for transition firings. The dynamics of the system were simulated, and the concentrations of major biological species over time were observed. The phenotypic characteristics of the integrated system were investigated under two conditions: the absence and the presence of osmotic pressure. The results were validated against existing experimental results and agreed favorably with them. We also compared our study with the idFBA study (Lee et al., PLoS Comput Biol 4:e1000-e1086, 2008) and pointed out the differences between the two. We simulated and monitored the concentrations of multiple biological entities over time and also incorporated feedback inhibition by Ptp2, which was not included in the idFBA study. To the best of our knowledge, our study is the first to model signaling, metabolic, and regulatory events in an integrated form through a PN model framework. This study is useful for the computational simulation of system dynamics for integrated pathways, as there is growing evidence that malfunctioning of the interplay among these pathways is associated with disease.
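
For readers unfamiliar with Petri nets, the sketch below shows the basic firing rule: a transition is enabled when each of its input places holds enough tokens, and firing it moves tokens from inputs to outputs. It is a minimal discrete net without the kinetic parameters the study uses, and the place and transition names are hypothetical stand-ins, not the published HOG pathway model.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in transition["inputs"].items())


def fire(marking, transition):
    """Return the marking after consuming input tokens and producing output tokens."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


def simulate(marking, transitions, max_steps=100):
    """Repeatedly fire the first enabled transition and record each marking."""
    history = [dict(marking)]
    for _ in range(max_steps):
        ready = [t for t in transitions if enabled(marking, t)]
        if not ready:
            break  # dead marking: nothing can fire
        marking = fire(marking, ready[0])
        history.append(dict(marking))
    return history


# Hypothetical toy net: a signal token activates one kinase token per firing.
activate = {
    "inputs": {"signal": 1, "kinase_inactive": 1},
    "outputs": {"kinase_active": 1},
}
start = {"signal": 2, "kinase_inactive": 3, "kinase_active": 0}
for step in simulate(start, [activate]):
    print(step)
```

A kinetic simulation would replace the "first enabled transition" choice with firing times or rates derived from the reaction parameters.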


Subjects
Computer Simulation , Mitogen-Activated Protein Kinases/physiology , Models, Biological , Saccharomyces cerevisiae Proteins/physiology , Saccharomyces cerevisiae/physiology , Feedback, Physiological , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Metabolic Networks and Pathways , Osmotic Pressure , Phosphoric Monoester Hydrolases/physiology , Signal Transduction , Stress, Physiological , Water-Electrolyte Balance